
    Constructing ensembles for intrinsically disordered proteins

    The relatively flat energy landscapes associated with intrinsically disordered proteins make modeling these systems especially problematic. A comprehensive model of these proteins requires building an ensemble: a finite collection of structures, together with their relative stabilities, that adequately captures the range of states accessible to the protein. Methods that use computational techniques to interpret experimental data in terms of such ensembles are therefore an essential part of the modeling process. In this review, we critically assess the advantages and limitations of current techniques and discuss new methods for validating these ensembles.
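To make the notion of an ensemble concrete, here is a minimal sketch (not taken from the review). It converts assumed relative free energies of a few structures into Boltzmann populations, then uses those populations to average a per-structure observable. The function names, the kcal/mol units, and the 298 K default are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch; names, units and defaults are assumptions, not from the review.
# Boltzmann constant in kcal/(mol K); free energies are assumed to be in kcal/mol.
K_B = 0.0019872041

def ensemble_populations(free_energies, temperature=298.0):
    """Convert relative free energies of ensemble members into normalized populations."""
    g = np.asarray(free_energies, dtype=float)
    beta = 1.0 / (K_B * temperature)
    # Shift by the minimum before exponentiating to avoid overflow.
    p = np.exp(-beta * (g - g.min()))
    return p / p.sum()

def ensemble_average(observables, populations):
    """Population-weighted average of per-structure observables (rows = structures)."""
    return np.asarray(populations) @ np.asarray(observables, dtype=float)
```

Lower free energy means a larger population. An observable measured on the whole ensemble corresponds to `ensemble_average`, not to the value for any single structure.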

    Deep Metric Learning for the Hemodynamics Inference with Electrocardiogram Signals

    Heart failure is a debilitating condition that affects millions of people worldwide and has a significant impact on their quality of life and mortality. Objective assessment of cardiac pressures remains an important tool for diagnosis and treatment prognostication in patients with heart failure. Although cardiac catheterization is the gold standard for estimating central hemodynamic pressures, it is an invasive procedure with inherent risks that can make it dangerous for some patients. Approaches that leverage non-invasive signals, such as the electrocardiogram (ECG), promise to make routine estimation of cardiac pressures feasible in both inpatient and outpatient settings. Prior models trained in a supervised fashion to estimate intracardiac pressures (e.g., mean pulmonary capillary wedge pressure (mPCWP)) have shown good discriminatory ability but have been limited to the labeled dataset from the heart failure cohort. To address this issue and build a robust representation, we apply deep metric learning (DML) and propose a novel self-supervised DML with distance-based mining that improves model performance when labels are limited. We use a dataset of over 5.4 million ECGs without concomitant central pressure labels to pre-train a self-supervised DML model, which showed improved classification of elevated mPCWP compared to self-supervised contrastive baselines. Additionally, the supervised DML model trained on ECGs with access to 8,172 mPCWP labels performed significantly better on the mPCWP regression task than the supervised baseline. Moreover, our data suggest that DML yields models that are performant across patient subgroups, even when some subgroups are under-represented in the dataset. Our code is available at https://github.com/mandiehyewon/ssldm
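Distance-based mining can take several forms, and the abstract does not say which one the authors use. As a hedged illustration, the sketch below implements one common variant, "batch-hard" triplet mining. Each anchor is paired with its farthest same-label embedding and its nearest different-label embedding. Labels are treated generically. In a self-supervised setting they might be instance IDs shared by two augmented views of the same ECG, and in a supervised setting they could be binned mPCWP values; both readings are assumptions. The function names and margin are hypothetical.

```python
import numpy as np

# Hypothetical sketch of batch-hard triplet mining; not the authors' implementation.

def pairwise_distances(emb):
    """Euclidean distance matrix between the rows of emb."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from round-off

def triplet_loss_hard_mining(emb, labels, margin=0.2):
    """For each anchor, use its farthest positive and nearest negative in the batch."""
    labels = np.asarray(labels)
    d = pairwise_distances(np.asarray(emb, dtype=float))
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)
    neg_mask = ~same
    losses = []
    for i in range(len(labels)):
        if not pos_mask[i].any() or not neg_mask[i].any():
            continue  # anchor has no valid positive or negative in this batch
        hardest_pos = d[i, pos_mask[i]].max()
        hardest_neg = d[i, neg_mask[i]].min()
        losses.append(max(hardest_pos - hardest_neg + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0
```

Mining by distance focuses the gradient on the triplets that currently violate the margin. This is one way a metric-learning objective can extract signal from few labels.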

    Intrinsically Disordered Proteins: Where Computation Meets Experiment

    Proteins are heteropolymers that play important roles in virtually every biological reaction. While many proteins have well-defined three-dimensional structures that are inextricably coupled to their function, intrinsically disordered proteins (IDPs) lack a well-defined structure, and it is this lack of structure that facilitates their function. Because many IDPs are involved in essential cellular processes, their malfunction has been linked to various diseases, making them important drug targets. In this review we discuss methods for studying IDPs and provide examples of how computational methods can improve our understanding of them. We focus on two intensely studied IDPs that have been implicated in very different pathologic pathways. The first, p53, has been linked to over 50% of human cancers, and the second, amyloid-β (Aβ), forms neurotoxic aggregates in the brains of patients with Alzheimer’s disease. We use these representative proteins to illustrate some of the challenges associated with studying IDPs and demonstrate how computational tools can be fruitfully applied to arrive at a more comprehensive understanding of these fascinating heteropolymers.
    National Science Foundation (U.S.). Directorate for Biological Sciences. Postdoctoral Research Fellowship (Grant 1309247)

    A Structure-free Method for Quantifying Conformational Flexibility in Proteins

    All proteins sample a range of conformations at physiologic temperatures, and this inherent flexibility enables them to carry out their prescribed functions. A comprehensive understanding of protein function therefore entails a characterization of protein flexibility. Here we describe a novel approach for quantifying a protein’s flexibility in solution using small-angle X-ray scattering (SAXS) data. The method calculates an effective entropy that quantifies the diversity of radii of gyration a protein can adopt in solution, and it does not require the explicit generation of structural ensembles to garner insights into protein flexibility. Application of this structure-free approach to over 200 experimental datasets demonstrates that the methodology can quantify a protein’s disorder as well as the effects of ligand binding on protein flexibility. Such quantitative descriptions of protein flexibility form the basis of a rigorous taxonomy for describing and classifying protein structure.
    Massachusetts Institute of Technology (Steve G. and Renee Finn Faculty Innovation Fellowship)
    Swiss National Science Foundation (Early Postdoc.Mobility Fellowship)
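The abstract does not define the effective entropy, so the following is only a sketch of the underlying intuition. It assumes a weighted sample of radii of gyration is already available; how such a distribution is inferred from SAXS data is not shown. Given that sample, the Shannon entropy of its histogram grows as the distribution broadens. A fixed `rg_range` keeps values comparable across proteins. The bin count, the use of natural logarithms, and the function name are assumptions.

```python
import numpy as np

# Illustrative sketch only; this is not the paper's definition of effective entropy.

def rg_entropy(rg_values, weights=None, bins=20, rg_range=None):
    """Shannon entropy (in nats) of a weighted histogram of radii of gyration."""
    hist, _ = np.histogram(rg_values, bins=bins, range=rg_range, weights=weights)
    p = hist / hist.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)))
```

A rigid protein whose Rg values fall in one bin scores 0. A distribution spread evenly over k bins scores log(k), so larger values indicate greater conformational diversity.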

    Comparative Studies of Disordered Proteins with Similar Sequences: Application to Aβ40 and Aβ42

    Quantitative comparisons of intrinsically disordered proteins (IDPs) with similar sequences, such as mutant forms of the same protein, may provide insights into IDP aggregation, a process that plays a role in several neurodegenerative disorders. Here we describe an approach for modeling IDPs with similar sequences that simplifies the comparison of their ensembles by using a single shared library of structures. The relative population weights of the structures are estimated using a Bayesian formalism, which provides measures of uncertainty in the resulting ensembles. We applied this approach to compare the ensembles of Aβ40 and Aβ42. Bayesian hypothesis testing finds that although both Aβ species sample β-rich conformations in solution that may represent prefibrillar intermediates, the probability that Aβ42 samples these prefibrillar states is roughly an order of magnitude larger than the probability that Aβ40 samples such structures. Moreover, the structure of the soluble prefibrillar state in our ensembles is similar to the experimentally determined structure of Aβ that has been implicated as an intermediate in the aggregation pathway. Overall, our approach for comparative studies of IDPs with similar sequences provides a platform for future studies of the effect of mutations on the structure and function of disordered proteins.
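The Bayesian weighting can be sketched under assumptions of our own:

- a symmetric Dirichlet prior over population weights;
- a Gaussian likelihood on back-calculated observables;
- simple importance sampling in place of whatever sampler the authors used.

Because both sequences share one structure library, the same `subset` of indices can be compared directly between the two posteriors. Such a subset might be hypothetical β-rich structures. All names are illustrative.

```python
import numpy as np

# Hypothetical sketch: Dirichlet prior + Gaussian likelihood, importance-sampled.

def sample_ensemble_weights(predicted, observed, sigma, n_samples=20000, alpha=1.0, seed=None):
    """predicted: (n_structures, n_observables) back-calculated data per structure.
    observed: (n_observables,) measurements; sigma: assumed Gaussian error.
    Returns candidate weight vectors and their normalized importance weights."""
    rng = np.random.default_rng(seed)
    w = rng.dirichlet(np.full(predicted.shape[0], alpha), size=n_samples)  # prior draws
    avg = w @ predicted  # ensemble-averaged observables per draw
    log_like = -0.5 * np.sum(((avg - observed) / sigma) ** 2, axis=1)
    iw = np.exp(log_like - log_like.max())  # stabilize before normalizing
    return w, iw / iw.sum()

def posterior_population(w, iw, subset):
    """Posterior mean of the total population of the structures in `subset`."""
    return float(iw @ w[:, subset].sum(axis=1))

def prob_a_exceeds_b(w_a, iw_a, w_b, iw_b, subset, n_draws=20000, seed=None):
    """Monte Carlo estimate of P(population_a(subset) > population_b(subset))."""
    rng = np.random.default_rng(seed)
    pop_a = rng.choice(w_a[:, subset].sum(axis=1), size=n_draws, p=iw_a)
    pop_b = rng.choice(w_b[:, subset].sum(axis=1), size=n_draws, p=iw_b)
    return float(np.mean(pop_a > pop_b))
```

Importance sampling from the prior is simple but becomes inefficient for large libraries. A Markov chain Monte Carlo sampler would be the more scalable choice.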

    Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series

    Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vital signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These methods cannot be readily extended to time series that exhibit multimodality, with structured features and high-dimensional data recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method, Sequential Multi-Dimensional SSL, in which an SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence, in order to better capture information at both scales. Our strategy is agnostic to the specific form of loss function used at each level: it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contain sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vital signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets and, in several settings, can lead to improvements across different self-supervised loss functions.
    Comment: ICML 202
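Here is a minimal sketch of the two-level objective. It assumes per-sequence and per-timestep embeddings have already been produced by some encoder, which is not shown. `nt_xent` is a SimCLR-style contrastive loss. The combined loss takes `loss_fn` as an argument, so a non-contrastive function such as a VICReg-style loss could be swapped in; this mirrors the loss-agnostic design the abstract describes. The function names and the weighting `lam` are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a sequence-level + timestep-level SSL objective.

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent loss; row i of z1 is the positive pair of row i of z2."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    n = z1.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    m = sim.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(sim - m).sum(axis=1))  # stable log-sum-exp
    return float(np.mean(lse - sim[np.arange(2 * n), pos]))

def multilevel_ssl_loss(seq_a, seq_b, steps_a, steps_b, loss_fn=nt_xent, lam=1.0):
    """seq_*: (batch, d) whole-sequence embeddings of two augmented views.
    steps_*: (batch, T, d) embeddings of the individual high-dimensional points."""
    b, t, d = steps_a.shape
    seq_loss = loss_fn(seq_a, seq_b)
    step_loss = loss_fn(steps_a.reshape(b * t, d), steps_b.reshape(b * t, d))
    return seq_loss + lam * step_loss
```

Flattening the timestep embeddings lets every point in the batch serve as a negative for the others. This is one choice among several for the within-sequence level.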

    Hidden States within Disordered Regions of the CcdA Antitoxin Protein

    The bacterial toxin–antitoxin system CcdB–CcdA provides a mechanism for the control of cell death and quiescence. The antitoxin protein CcdA is a homodimer composed of two monomers, each containing a folded N-terminal region and an intrinsically disordered C-terminal arm. Binding of the intrinsically disordered C-terminal arm of CcdA to the toxin CcdB prevents CcdB from inhibiting DNA gyrase and thereby averts cell death. Accurate models of the unfolded state of the partially disordered CcdA antitoxin can therefore provide insight into general mechanisms whereby protein disorder regulates events that are crucial to cell survival. Previous structural studies were able to model only two of the three distinct structural states adopted by the C-terminal arm of CcdA: a closed state and an open state. Using a combination of free energy simulations, single-pair Förster resonance energy transfer experiments, and existing NMR data, we developed structural models for all three states of the protein. Contrary to prior studies, we find that CcdA samples a previously unknown state in which only one of the disordered C-terminal arms makes extensive contacts with the folded N-terminal domain. Moreover, our data suggest that previously unobserved conformational states play a role in regulating antitoxin concentrations and the activity of CcdA’s cognate toxin. These data demonstrate that intrinsic disorder in CcdA provides a mechanism for regulating cell fate.
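The single-pair FRET measurements rest on the Förster relation E = 1 / (1 + (r/R0)^6). The sketch below converts between transfer efficiency and donor-acceptor distance. This is one way candidate distances from structural models of the three states could be compared with measured efficiencies. The Förster radius used in the check (5.0 nm) is a placeholder, not the value for the dye pair in the study, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helper functions; distances and R0 share the same (assumed) units, e.g. nm.

def fret_efficiency(r, r0):
    """Förster transfer efficiency for donor-acceptor distance r: E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (np.asarray(r, dtype=float) / r0) ** 6)

def fret_distance(e, r0):
    """Invert the Förster relation to get the distance for an efficiency 0 < E < 1."""
    e = np.asarray(e, dtype=float)
    return r0 * ((1.0 - e) / e) ** (1.0 / 6.0)
```

With the placeholder R0 of 5.0 nm, a state with r = 5.0 nm gives E = 0.5. The sixth-power dependence makes E most sensitive to distance changes near R0.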

    The Effect of a ΔK280 Mutation on the Unfolded State of a Microtubule-Binding Repeat in Tau

    Tau is a natively unfolded protein that forms intracellular aggregates in the brains of patients with Alzheimer's disease. To decipher the mechanism underlying the formation of tau aggregates, we developed a novel approach for constructing models of natively unfolded proteins. The method, energy-minima mapping and weighting (EMW), samples local energy minima of subsequences within a natively unfolded protein. From these energetically favorable conformations, it then constructs ensembles that are consistent with a given set of experimental data. A unique feature of the method is that it does not strive to generate a single ensemble that represents the unfolded state. Instead, we construct a number of candidate ensembles, each of which agrees with a given set of experimental constraints, and focus our analysis on local structural features that are present in all of the independently generated ensembles. Using EMW, we generated ensembles that are consistent with chemical shift measurements obtained on tau constructs. Thirty models were constructed for the second microtubule-binding repeat (MTBR2) in wild-type (WT) tau and in a ΔK280 mutant, which is found in some forms of frontotemporal dementia. By focusing on structural features that are preserved across all ensembles, we find that the aggregation-initiating sequence, PHF6*, prefers an extended conformation in both the WT and ΔK280 sequences. In addition, we find that residue K280 can adopt a loop/turn conformation in WT MTBR2. Deletion of this residue, which can adopt nonextended states, leads to an increase in locally extended conformations near the C-terminus of PHF6*. Because an increased preference for extended states near the C-terminus of PHF6* may facilitate the propagation of β-structure downstream from PHF6*, these results explain how a deletion at position 280 can promote the formation of tau aggregates.
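The EMW idea of building many candidate ensembles and keeping only the features they all share can be sketched as below. This is not EMW itself. It assumes three inputs:

- a precomputed pool of conformations (in EMW these come from local energy minima of subsequences);
- back-calculated observables for each conformation, such as chemical shifts;
- a 0/1 feature matrix, such as "residue i is extended".

Each candidate ensemble is drawn from a random subset of the pool and fit to the data by projected gradient descent on the probability simplex. All names, the random-subset strategy, and the thresholds are hypothetical.

```python
import numpy as np

# Hypothetical EMW-inspired sketch; the pool, subsets and thresholds are assumptions.

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_weights(pred, obs, n_iter=2000):
    """Simplex weights w minimizing ||pred.T @ w - obs||^2 by projected gradient descent.
    pred: (n_conformations, n_observables), e.g. back-calculated chemical shifts."""
    n = pred.shape[0]
    w = np.full(n, 1.0 / n)
    step = 1.0 / (2.0 * np.linalg.norm(pred, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * pred @ (pred.T @ w - obs)
        w = project_to_simplex(w - step * grad)
    return w

def consensus_features(pred, obs, features, n_ensembles=30, subset_frac=0.5,
                       tol=0.05, threshold=0.5, seed=0):
    """Fit candidate ensembles on random subsets of the pool and keep those whose RMS
    deviation from the data is <= tol. Flag features whose weighted population exceeds
    `threshold` in every kept ensemble. Returns (flags, number of kept ensembles)."""
    rng = np.random.default_rng(seed)
    n = pred.shape[0]
    k = max(1, int(subset_frac * n))
    populations = []
    for _ in range(n_ensembles):
        idx = rng.choice(n, size=k, replace=False)
        w = fit_weights(pred[idx], obs)
        rms = np.sqrt(np.mean((pred[idx].T @ w - obs) ** 2))
        if rms <= tol:
            populations.append(w @ features[idx])
    if not populations:
        return np.zeros(features.shape[1], dtype=bool), 0
    return np.all(np.array(populations) > threshold, axis=0), len(populations)
```

Requiring agreement across all accepted ensembles trades sensitivity for robustness. A feature is reported only if no data-consistent ensemble contradicts it.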